ABSTRACT

One important objective of a cooperative project between the U.S. Bureau of Cen- 
sus and NASA is to develop the ability to accurately delineate the types of land 
cover in the urban-rural transition zone of metropolitan areas. The application 
of principal components analysis to multidate Landsat imagery is being investigated
as a method of reducing the overlap between residential and agricultural
spectral signatures. The statistical concepts of principal components analysis 
are discussed, as well as the results of this analysis when applied to multidate 
Landsat imagery of the Washington, D.C. metropolitan area. 

A REDUCTION IN AG./RESIDENTIAL SIGNATURE
CONFLICT USING PRINCIPAL COMPONENTS ANALYSIS
OF LANDSAT TEMPORAL DATA


INTRODUCTION

Research pertaining to the use of Landsat imagery has indicated that the four
band multispectral scanner (MSS) data can be utilized for the delineation of the 
major types of land cover typically found in urbanized areas (Christenson and 
Lachowski, 1976). The incentive for exploring this particular application of the
Landsat imagery is that it yields repetitive, synoptic views of major metropol- 
itan areas, providing a means of monitoring urban growth and change on a regu- 
lar basis. In this respect, the Geography Division of the U.S. Bureau of the 
Census is currently involved in a cooperative project with the Earth Resources
Branch at NASA's Goddard Space Flight Center. 

One important objective within the context of this project is to develop the ability 
to accurately delineate the types of land cover in the urban-rural transition zone 
in the fringe of metropolitan areas. However, given Landsat's spectral and spa- 
tial resolution, and the diversity of land cover in the urban-rural transition zone, 
results have shown that the spectral signatures for certain residential develop- 
ments are quite often similar to those in areas of agricultural land use. Digital 
classifications based upon such overlapping signatures often result in false 
alarms of residentially classified pixels appearing in agricultural areas. This 
effect is largely due to the heterogeneity over short distances found in these
areas relative to Landsat's spectral and spatial resolution. Consider, for example,
the intermingling of roof tops, lawns, streets and wood lots in residential
areas, surrounded by agricultural lands consisting of row crops, strip cropping
patterns, pastures, fence rows, access roads and wood lots. The resulting
signature confusion between these two different types of land cover must be over- 
come prior to the routine use of Landsat MSS data by agencies such as the U.S. 
Bureau of the Census. 

Several methods of digital processing, including image enhancement, are currently
being investigated to reduce the problem of ag./residential signature confusion.
The approach which will be discussed within this report is the application
of principal components analysis. This statistical processing method has
been successfully used to discriminate certain rock and soil types (Podwysocki
et al., 1977), and therefore warranted investigation for urban area delineation. 

DESCRIPTION OF THE STUDY AREA


A portion of Prince Georges County, Maryland was chosen as the study area.
The entire county falls within the Washington, D.C. Standard Metropolitan Statistical
Area and its selection was governed by: (a) rapid suburban growth in
recent years; (b) diversity in land use and good representation of a transition
zone from urban to rural; and (c) the availability of supporting aircraft photography,
maps, statistical documentation, and accessibility for ground truth surveys.


DATA SOURCES 

Previous investigations have indicated that if seasonal Landsat image sets are
geometrically registered and merged, the ability to differentiate between cer- 
tain land use categories increases (Williams, 1976; Kan and Dillman, 1975). 

Fall (11 Oct. '72) and spring (9 Apr. '73) Landsat-1 scenes of the Washington,
D.C. area were being utilized in other phases of the Census Bureau project, and 
thus were chosen for analysis. These two image sets were geometrically reg- 
istered and merged prior to analysis so that each pixel represented the same 
ground area for the two dates of coverage. Thus, a total of eight spectral values
were available for each pixel, and this data served as the basis for obtaining the 
training area statistics for each type of land cover. This image set provided a 
basis for evaluating differences in classification results between the two dates, 
between the separate dates and combined dates, or between any of these and the
principal components classification. Any differences in these various classifi- 
cations had to be the direct result of analytic differences since common training 
areas (i.e., groups of pixels) were used. 
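
The merging step described above can be pictured as stacking two co-registered four-band arrays so that every pixel carries eight values. The following is a minimal sketch under stated assumptions: the function name `merge_dates`, the array layout (rows, columns, bands), and the random stand-in data are all hypothetical, and the original work was carried out with the ORSER system rather than with this code.

```python
import numpy as np

def merge_dates(fall: np.ndarray, spring: np.ndarray) -> np.ndarray:
    """Stack two registered 4-band scenes into one 8-channel image."""
    if fall.shape != spring.shape:
        raise ValueError("scenes must be registered to the same grid")
    # Channels 1-4 come from the fall scene, channels 5-8 from the spring scene.
    return np.concatenate([fall, spring], axis=-1)

# Synthetic stand-in data (not Landsat values): a 3 x 5 pixel window.
rng = np.random.default_rng(0)
fall = rng.integers(0, 128, size=(3, 5, 4))
spring = rng.integers(0, 128, size=(3, 5, 4))

merged = merge_dates(fall, spring)   # shape (3, 5, 8)
pixels = merged.reshape(-1, 8)       # one row of eight values per pixel
```

Flattening to one row per pixel is a convenience for the statistical steps that follow; it assumes the registration has already been done.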

PROCESSING OF THE EIGHT CHANNEL MSS IMAGERY 

Computer processing of the geometrically corrected, multidate Landsat imagery 
was accomplished using The Pennsylvania State University ORSER System 
(McMurty et al., 1974). Preliminary processing efforts followed the conventional 
supervised method of analysis for each individual image date by: (a) selecting
training areas; (b) computing spectral signatures and related statistics for these 
areas; and (c) classifying the study area using a euclidean distance classifier. 

A comparison of the spectral signatures and their euclidean distance of separation
for each of the two dates revealed a rather close similarity between residential
and agricultural signatures. This resulted in considerable misclassification
(i.e., residential pixels occurring in agricultural areas).
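
To illustrate why close signatures lead to confusion, a euclidean distance classifier can be sketched as follows. This is an illustrative reconstruction, not the ORSER implementation: the function name `classify_min_distance`, the threshold handling, and all numeric values are assumptions made for the example.

```python
import numpy as np

def classify_min_distance(pixels, signatures, threshold):
    """Assign each pixel to the nearest signature, or -1 if none is within threshold."""
    pixels = np.asarray(pixels, dtype=float)
    signatures = np.asarray(signatures, dtype=float)
    # Distance from every pixel (rows) to every category mean (columns).
    diffs = pixels[:, None, :] - signatures[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    nearest = dists.argmin(axis=1)
    within = dists[np.arange(len(pixels)), nearest] <= threshold
    return np.where(within, nearest, -1)

# Hypothetical 4-channel signatures (illustrative numbers only).
signatures = [[30.0, 25.0, 60.0, 70.0],   # category 0
              [32.0, 27.0, 58.0, 66.0]]   # category 1, deliberately close to category 0
pixels = [[30.0, 25.0, 60.0, 70.0],
          [32.0, 27.0, 58.0, 65.0],
          [90.0, 90.0, 90.0, 90.0]]
labels = classify_min_distance(pixels, signatures, threshold=10.0)
```

When two category means lie only a few counts apart, as in this example, small within-class variation is enough to move a pixel from one category to the other.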



Eight-channel statistics for combined dates were computed for exactly the same 
training areas and the study area was again classified. Some improvement in 
signature separability, and thus in classification results, was realized due to the
fall/spring seasonal differences between the residential woodland and lawns and 
the surrounding agricultural crops. However, unacceptable levels of ag./resi- 
dential confusion remained. 

At this point, it was decided to investigate the potential attributes of principal 
components analysis, applied to the 8-channel multidate image set. The follow- 
ing discussion of principal components analysis is presented so that the reader 
may become familiar with the basic concepts of this statistical approach prior 
to the discussion of the results obtained using this method. 


PRINCIPAL COMPONENTS ANALYSIS
Background 

Principal components analysis has a long history of theoretical evaluation and 
application in statistics and biometrics dating to the early 1900's (Spearman, 
1904). The emphasis has been on phenomenological interpretations based on 
relationships among the variables. The same analysis, known as the Karhunen-Loeve
Expansion (Fu, 1968) in the engineering and pattern recognition disciplines,
has been used predominantly for its dimensionality reduction transformation. 

The two approaches to the same analysis have come together in the field of re- 
mote sensing where interpretation and dimensionality reduction are simulta- 
neously important. 

Principal components analysis of p original, say X, variables determines a lin- 
ear transformation which condenses essentially all of the information in the 
original data into q new, say Y, variables where q is less than p. No distinction 
is made between meaningful variability (information) and random, undesirable 
variability (noise). For this reason, recovery of all variability in order to preserve
the information is the ideal objective of the transformation into the q Y-variables.
However, in most real cases some variability is lost when q is less
than p. The variability measure is the total variance, i.e., the sum of the variances
for the X-variables, and is determined on the basis of the sums of observed
deviations from the grand sample or population means. Therefore, in
the ideal case the total variance for the q Y-variables would equal the total variance
of the p X-variables.

The transformation is found so that the Y-variables are uncorrelated (orthogonal),
whereas the X-variables are not. Generally, the greater the correlations
among the X-variables, the smaller q will have to be relative to p. This is certainly
the case for typical MSS imagery where a high degree of correlation often
exists between selected pairs of channels. If the X-variables were nearly uncorrelated
to begin with, there would be essentially no advantage in the analysis.
Orthogonality results from (i.e., is a condition of) the analysis. However, the
unique characteristic of principal components analysis is that the first Y-variable
has the greatest possible variance associated with it; the second Y-variable has
the greatest possible variance of the remaining variance and so on. Geometrically,
the transformation corresponds to a rotation of the original axes to new ones which
are orthogonal to each other under the conditions given above. Pattern recognition
and multivariate statistical texts typically show the geometric concepts as well as 
the algebraic relations for principal components analysis (Fu, 1968; Seal, 1964). 
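
The properties listed above (uncorrelated Y-variables, variances in decreasing order, total variance preserved when all components are kept) can be checked numerically. The sketch below derives the components from the eigenvectors of the sample covariance matrix, which is one common way to carry out the analysis; the function name `principal_components` and the synthetic data are assumptions for illustration and are not taken from the original study.

```python
import numpy as np

def principal_components(x):
    """Return (variances, coefficients) from the covariance matrix of x.

    x has one row per observation and one column per X-variable.
    """
    cov = np.cov(x, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # largest variance first
    return eigvals[order], eigvecs[:, order]

# Synthetic, deliberately correlated data (not Landsat values).
rng = np.random.default_rng(1)
base = rng.normal(size=(500, 1))
x = np.hstack([base + 0.1 * rng.normal(size=(500, 1)) for _ in range(3)])

variances, coeffs = principal_components(x)
y = (x - x.mean(axis=0)) @ coeffs            # the Y-variables
```

Because the three synthetic X-variables are almost copies of one another, nearly all of the total variance falls on the first Y-variable, which mirrors the remark that strongly correlated channels allow a small q.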


Interpretation 

The eight channels (i.e., p equals 8) in the multidate image set yield a total variance
of 1127. This value is obtained by summing the variances for each of the
eight channels (Table 1). The total variance is fairly evenly divided between the
two image dates, with 48.5% in the four channels (1-4) of the October '72 image
and 51.5% in the four channels (5-8) of the April '73 image. According to the
correlations among channels, it would be expected that a substantial reduction
in dimensionality could be effected by a principal components analysis (Table 2).
Notice the number of strong correlations which are about .8 and greater.
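
The total variance and the split between dates can be reproduced from the channel variances listed in Table 1. The short check below uses only those published values; the variable names are illustrative.

```python
# Channel variances as listed in Table 1 (channels 1-4: Oct. '72; 5-8: April '73).
variances = [114.1, 226.8, 90.7, 116.4, 80.9, 132.2, 164.0, 201.5]

total = sum(variances)                                 # about 1127
october_share = 100 * sum(variances[:4]) / total      # share of channels 1-4
april_share = 100 * sum(variances[4:]) / total        # share of channels 5-8
```

Computed from the unrounded variances, the shares come out close to the quoted 48.5% and 51.5%; the small differences are consistent with rounding in the tabulated percentages.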


Table 1

Statistics for Eight Channel Multidate Landsat Data
(NOTE: 72,373 pixels in the data set)

Channels                          1      2      3      4      5      6      7      8

Mean                           47.6   36.5   64.5   73.3   62.1   54.9   74.5   77.6
Variance (Σ = 1127)           114.1  226.8   90.7  116.4   80.9  132.2  164.0  201.5
Percentage of Total Variance   10.1   20.1    8.0   10.3    7.2   11.7   14.6   17.9

Table 2

Correlations Between MSS Channels

Channel       1      2      3      4      5      6      7      8

   1       1.00
   2       0.96   1.00
   3       0.59   0.61   1.00
   4      -0.03  -0.03   0.71   1.00
   5       0.84   0.81   0.44  -0.10   1.00
   6       0.79   0.80   0.47  -0.05   0.93   1.00
   7       0.59   0.57   0.58   0.28   0.65   0.53   1.00
   8       0.29   0.26   0.49   0.42   0.32   0.18   0.90   1.00

Principal components analysis did confirm that a substantial dimensionality reduction
could be made. Because of the estimation precision brought about by the
72,373 pixels in the data set, all eight Y-variables were significant, but a q of
four or five would be an acceptable subset. The principal components variance 
and percentage values show that the percentages of the total variance recovered 
when q equals four or five components are 98.2 and 99.0 respectively (Table 3). 

In comparison, the four MSS channels which account for the greatest percentage
of the total variance are 2, 6, 7, and 8, and that percentage is only 64.3 (Table 1).
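
The comparison between the leading components and the best four original channels can be verified directly from the published variances. The sketch below uses only the values in Tables 1 and 3; the variable names are illustrative.

```python
import numpy as np

# Principal component variances from Table 3 and channel variances from Table 1.
pc_var = np.array([669.6, 273.4, 118.3, 45.4, 8.3, 5.5, 3.7, 2.5])
ch_var = np.array([114.1, 226.8, 90.7, 116.4, 80.9, 132.2, 164.0, 201.5])

# Cumulative percentage of total variance recovered by the first q components.
cumulative = 100 * np.cumsum(pc_var) / pc_var.sum()

# Best four original channels: take the four largest channel variances.
best_four = 100 * np.sort(ch_var)[-4:].sum() / ch_var.sum()
```

Here `cumulative[3]` and `cumulative[4]` correspond to q equal to four and five, and `best_four` corresponds to channels 2, 6, 7, and 8.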

The correlations between X-variables and Y-variables were computed (Table 4)
and their interpretation follows the same reasoning as for any simple correlations.
In this case, statistical evaluation is uninformative since the estimation
of the correlations is so precise that even those less than .1 are statistically
significant. For the first principal component, which accounts for 59.4% of the
total variance, all channels are strongly positively correlated, with the exception
of channel 4 (.8 to 1.1 μm of the Oct. '72 image). This result is typical of
the first principal component in biometrics applications and citations of a number
of such examples are available (Seal, 1964, p. 122). The interpretation of
similar results in the biometrics context is that the first component is a measure
of over-all size or magnitude.
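
For reference, when the components are derived from the covariance matrix, the correlation between channel X_i and component Y_j follows from the transformation coefficient a_ij (Table 5), the component variance λ_j (Table 3), and the channel variance s_i² (Table 1). The symbols a_ij, λ_j, and s_i are notation introduced here for illustration and do not appear in the original text. The worked value for channel 1 and the first component agrees with the corresponding entry in Table 4.

```latex
r(X_i, Y_j) = \frac{a_{ij}\,\sqrt{\lambda_j}}{s_i},
\qquad
r(X_1, Y_1) = \frac{0.3673\,\sqrt{669.6}}{\sqrt{114.1}} \approx 0.89
```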

Table 3

Statistics Resulting from Principal Components Analysis of the
Eight Channel Multidate Landsat Data

Principal Component                      1      2      3      4      5      6      7      8

Variance                             669.6  273.4  118.3   45.4    8.3    5.5    3.7    2.5
Percentage of Total Variance          59.4   24.3   10.5    4.0    0.8    0.5    0.3    0.2
Cumulative Percentage of
Total Variance                        59.4   83.7   94.2   98.2   99.0   99.5   99.8  100.0

Table 4

Correlations of MSS Channels with Principal Components

                          Principal Component
Channel       1      2      3      4      5      6      7      8

   1       0.89   0.36   0.06   0.19  -0.17   0.01   0.04  -0.08
   2       0.89   0.39   0.10   0.21   0.07   0.05  -0.02   0.04
   3       0.73  -0.27   0.59   0.05   0.06  -0.18   0.05  -0.00
   4       0.21  -0.68   0.68  -0.13  -0.06   0.10  -0.04   0.01
   5       0.86   0.34  -0.14  -0.29  -0.17  -0.06  -0.01   0.11
   6       0.81   0.42   0.02  -0.40   0.09   0.04   0.04  -0.04
   7       0.86  -0.41  -0.27  -0.04   0.01  -0.04  -0.10  -0.04
   8       0.63  -0.72  -0.29   0.05   0.01   0.04   0.07   0.02

The correlations of MSS channels with principal components decrease from one
component to the next, with fewer and fewer standing out against the others. The
two MSS channels least correlated with the first principal component are most
strongly correlated with the second component. These are MSS channels 4 and 8,
the .8 to 1.1 μm band for each date. Principal component three is composed
mainly of the contrast between the two dates for the two infrared bands (MSS
channels 3, 4, 7, and 8). Similarly, but less evidently, the fourth principal component
reflects the contrast between dates for the pairs of visible bands (MSS channels
1, 2, 5, and 6). Such interpretations can be applied to develop an understanding
of which characteristics in the data contribute to each component.

Data Transformation Using the Principal Components 

Transformation of the eight X-variables into five Y-variables accounting for
99.0% of the total variance was done by using transformation coefficients (Table 5).
The general form of the equation is the same as a multiple linear regression
equation, with each of the five Y-variables being computed using the coefficients
in the corresponding column in the table. For example:

Y_2 = .236X_1 + .352X_2 - .155X_3 - .443X_4 + .186X_5 + .293X_6 - .317X_7 - .616X_8


Table 5

Transformation Coefficients for First Five Principal Components

                       Principal Component
Channel          1         2         3         4         5

   1        0.3673    0.2356    0.0623    0.2988   -0.6192
   2        0.5162    0.3524    0.1367    0.4761    0.3453
   3        0.2705   -0.1545    0.5160    0.0697    0.2140
   4        0.0896   -0.4432    0.6746   -0.2011   -0.2193
   5        0.2984    0.1860   -0.1125   -0.3873   -0.5239
   6        0.3578    0.2928    0.0172   -0.6849    0.3510
   7        0.4261   -0.3170   -0.3234   -0.0820    0.0658
   8        0.3446   -0.6158   -0.3722    0.1137    0.0396
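
Applying the coefficients amounts to one matrix product per pixel. The sketch below uses the Table 5 values as published; the function name `transform` is hypothetical, and whether the original processing subtracted the channel means before applying the coefficients is not stated in the text, so no centering is done here. The channel means from Table 1 serve only as a convenient example input.

```python
import numpy as np

# Transformation coefficients from Table 5: rows are MSS channels 1-8,
# columns are principal components 1-5.
COEFFS = np.array([
    [0.3673,  0.2356,  0.0623,  0.2988, -0.6192],
    [0.5162,  0.3524,  0.1367,  0.4761,  0.3453],
    [0.2705, -0.1545,  0.5160,  0.0697,  0.2140],
    [0.0896, -0.4432,  0.6746, -0.2011, -0.2193],
    [0.2984,  0.1860, -0.1125, -0.3873, -0.5239],
    [0.3578,  0.2928,  0.0172, -0.6849,  0.3510],
    [0.4261, -0.3170, -0.3234, -0.0820,  0.0658],
    [0.3446, -0.6158, -0.3722,  0.1137,  0.0396],
])

def transform(pixels):
    """Project 8-channel pixels (n x 8) onto the five components (n x 5)."""
    return np.asarray(pixels, dtype=float) @ COEFFS

# Example input: the channel means from Table 1, used here as a single "pixel".
mean_pixel = [47.6, 36.5, 64.5, 73.3, 62.1, 54.9, 74.5, 77.6]
y = transform([mean_pixel])   # shape (1, 5)
```

Each column of `COEFFS` has unit length to within rounding, which is consistent with the description of the transformation as a rotation of the original axes.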

The application of the principal component transformation coefficients reduced
the dimensionality of each of the 72,373 pixels in the multidate data set from
eight channels of MSS data to five orthogonal "transformed" axes. There are
two major justifications for applying such a transformation to the data. These
are the resulting reduction in dimensionality, and the generation of orthogonal
coordinate axes. The advantages of this reduction in dimensionality can be summarized
as: (a) a reduction in all subsequent computer processing costs; (b) a
reduction in the number of black and white and color composite images required
to visually interpret the information; and (c) a possible improvement in classifier
performance, for which there is some evidence in the literature (Marks
and Dunn, 1974). The attributes of orthogonal coordinate axes are related to the
assumption of orthogonality implicit in the euclidean distance classifier which
was used in this study.
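
One way to make the orthogonality remark concrete is to write out the squared euclidean distance between a transformed pixel and a category signature. The symbols y_k (pixel value on component k), m_k (signature mean on component k), and q (number of retained components, five in this study) are notation assumed for this sketch and are not taken from the original.

```latex
d^2 = \sum_{k=1}^{q} \left( y_k - m_k \right)^2
```

Each term is weighted equally and the terms are simply added, which treats the axes as independent directions. That treatment matches the uncorrelated Y-variables more closely than it matches the strongly correlated MSS channels of Table 2.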


Classification Results Using the Transformed Data

Using the same training areas defined during supervised classification of the
MSS data, new "spectral signatures" were obtained for each category. These
transformed signatures were then utilized to reclassify the study area, using a
common euclidean distance (i.e., threshold) for all categories. The resulting
thematic map yielded a noticeable reduction in the number of residential false
alarms in agricultural areas, while residential areas continued to be properly
represented. These preliminary observations were documented and verified by
choosing three known agricultural areas where considerable ag./residential confusion
existed using the MSS data. The results of these comparisons substantiated
a 3 to 1 reduction in the number of residential false alarms in these agricultural
test sites (Table 6).


Table 6

Summary of Misclassified Residential Pixels in Agricultural Sites

                  Oct. '72   April '73   Merged   Transformed

Site 1                13          6         13          6
Site 2                 7         10         10          1
Site 3                50         27         29         10

Summary Totals        70         43*        52         17

*Best MSS False Alarms = 43 vs. 17, or ~ 3 to 1.
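
The summary totals and the quoted ratio can be recomputed from the per-site counts in Table 6. The sketch below uses only the published counts; the dictionary layout and variable names are illustrative.

```python
# Misclassified residential pixels per site (Sites 1-3), from Table 6.
false_alarms = {
    "Oct. '72":    [13, 7, 50],
    "April '73":   [6, 10, 27],
    "Merged":      [13, 10, 29],
    "Transformed": [6, 1, 10],
}
totals = {name: sum(counts) for name, counts in false_alarms.items()}

# Best result on untransformed MSS data versus the transformed data.
best_mss = min(v for k, v in totals.items() if k != "Transformed")
ratio = best_mss / totals["Transformed"]
```

The ratio works out to about 2.5, which the table footnote reports as approximately 3 to 1.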

Using another, more time consuming approach, a detailed comparison of the
signatures and euclidean distances of separation was conducted for each residential
and agricultural category of each data set (i.e., Oct. '72, April '73, merged,
and transformed). Individual thresholds were then calculated to minimize the
possibility of "confusion" during a reclassification. In this instance, there was
an 8 to 1 reduction in the number of residential false alarms, with the transformed
data having a total of only 5 false residential pixels in the three test sites.
These results appear to confirm the expected improvement in the performance
of the euclidean distance classifier when operating on a reduced number of orthogonal
axes.


SUMMARY 

Based on the results of this study, the following generalizations can be made. 
Principal components analysis of geometrically corrected, multidate Landsat
imagery seems to be a useful technique for discriminating agricultural and 
residential land cover in the urban-rural transition zone. Eight MSS bands of 
temporal data were utilized to accentuate seasonal variations, and the principal 
components transformation dimensionally reduced the information into orthog- 
onal axes which are highly suited for certain classification algorithms such as 
euclidean distance. Plans are under way to test this technique in other U.S. 
cities, as ag./residential confusion seems to be a common problem associated
with Landsat data regardless of geographical locale. 